Classification Active Learning Based on Mutual Information

Authors

  • Jamshid Sourati
  • Murat Akçakaya
  • Jennifer G. Dy
  • Todd K. Leen
  • Deniz Erdogmus

Abstract

Selecting a subset of samples to label from a large pool of unlabeled data points, such that a sufficiently accurate classifier can be obtained from a reasonably small training set, is a challenging yet critical problem. It is challenging because solving it involves cumbersome combinatorial computations, and critical because labeling is expensive and time-consuming, so we always aim to minimize the number of required labels. While information-theoretic objectives, such as the mutual information (MI) between labels, have been used successfully in sequential querying, it is not straightforward to generalize them to batch mode. This is because, for many objectives, functions that are trivial to evaluate and optimize when querying one sample at a time become intractable when multiple queries must be selected together. In this paper, we develop a framework that provides efficient ways of evaluating and maximizing the MI between labels as an objective for batch-mode active learning. The proposed framework reduces the computational complexity from an order that grows with the batch size, when no approximation is applied, to a linear cost. The framework is evaluated on data sets from several fields, and the results show that it leads to efficient active learning for most of them.
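The combinatorial cost mentioned in the abstract can be made concrete with a small brute-force sketch. It is not the authors' method. It assumes a hypothetical model in which the labels of N pool points are independent given parameters theta, and theta is represented by M posterior samples, so averaging over the samples makes the labels dependent. Exact MI between the labels of a candidate batch A and the remaining labels, I(y_A; y_rest) = H(y_A) + H(y_rest) - H(y_all), then requires enumerating C^|A| (and C^N) label configurations. The helper names (`joint_label_probs`, `entropy`, `label_mi`, `greedy_batch`) and the greedy selection loop are illustrative assumptions.

```python
import itertools

import numpy as np


def joint_label_probs(probs, idx):
    """Brute-force joint distribution of the labels of the points in `idx`.

    Assumed (hypothetical) model: probs has shape (M, N, C) and
    probs[m, i, c] = p(y_i = c | theta_m) for M posterior samples theta_m.
    Labels are independent given theta_m; averaging over the samples makes
    them dependent. The loop visits all C ** len(idx) label configurations.
    """
    M, _, C = probs.shape
    joint = []
    for labels in itertools.product(range(C), repeat=len(idx)):
        per_sample = np.ones(M)
        for i, c in zip(idx, labels):
            per_sample *= probs[:, i, c]  # product over points, per sample
        joint.append(per_sample.mean())  # average over posterior samples
    return np.array(joint)


def entropy(probs, idx):
    """Shannon entropy (nats) of the joint label distribution over `idx`."""
    p = joint_label_probs(probs, idx)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def label_mi(probs, batch):
    """I(y_batch; y_rest) = H(y_batch) + H(y_rest) - H(y_all)."""
    n = probs.shape[1]
    rest = [i for i in range(n) if i not in batch]
    return (entropy(probs, list(batch)) + entropy(probs, rest)
            - entropy(probs, list(range(n))))


def greedy_batch(probs, k):
    """Hypothetical greedy loop: add the point that maximizes the batch MI."""
    batch = []
    for _ in range(k):
        candidates = [i for i in range(probs.shape[1]) if i not in batch]
        best = max(candidates, key=lambda i: label_mi(probs, batch + [i]))
        batch.append(best)
    return batch


rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(2), size=(20, 6))  # M=20 samples, N=6, C=2
print(greedy_batch(probs, 2))
```

With binary labels, each additional point doubles the number of configurations the exact computation must enumerate, which is why brute-force evaluation does not scale to realistic batch and pool sizes and why efficient evaluation and maximization schemes are needed.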

Similar resources

On Classification of Bivariate Distributions Based on Mutual Information

Among all measures of independence between random variables, mutual information is the only one that is based on information theory. Mutual information takes into account all kinds of dependencies between variables, i.e., both linear and non-linear dependencies. In this paper we have classified some well-known bivariate distributions into two classes of distributions based on their mutua...

Optimistic Active-Learning Using Mutual Information

An “active learning system” will sequentially decide which unlabeled instance to label, with the goal of efficiently gathering the information necessary to produce a good classifier. Some such systems greedily select the next instance based only on properties of that instance and the few currently labeled points — e.g., selecting the one closest to the current classification boundary. Unfortuna...

Active Instance Sampling via Matrix Partition

Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be f...

On Semi-Supervised Classification

A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the label...

Hyperspectral segmentation with active learning

This paper introduces a new supervised Bayesian approach to hyperspectral image segmentation, with two main steps: (a) learning, for each class label, the posterior probability distributions, based on a multinomial logistic regression model; (b) segmenting the hyperspectral image, based on the posterior probability distribution learnt in step (a) and on a multi-level logistic prior encoding the...

Journal title:
  • Entropy

Volume 18

Publication date: 2016